In the exercise on R Markdown basics yesterday, you already gained some experience in working with .Rmd files and producing HTML output. This time, we will build on what you have already done and what we covered in the lecture on Advanced R Markdown & LaTeX and generate a PDF report. As many of the tasks in this set of exercises repeat what we did yesterday, we will not provide lengthy explanations (or extended cues) for those parts. Instead, we will focus on the topics/aspects that are new and/or specific to producing PDF output and working with LaTeX in R Markdown. If you need a reminder on some of the basics and examples we have covered before, you can go back to the lecture slides and the previous exercises as well as the solutions for those.

For this exercise and the report we want to produce here, we will look at the relationship between trust in institutions and the use of news media sources in the early phase of the COVID-19 pandemic, using the synthetic data based on the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany. As in the previous R Markdown exercise, feel free to change anything you like in the report (formatting, variables, analyses, etc.).

As for the previous R Markdown exercise, you can find an example solution in the solutions folder within the workshop materials: The source file Report2_News_sources.Rmd and the output file Report2_News_sources.pdf. Also as in the previous R Markdown exercise, the R code in the solutions for this exercise should be put into code chunks (with appropriate labels and options) in the .Rmd document.

In addition to the packages we used for the previous R Markdown report (tidyverse, knitr, corrr, and stargazer), we will use one additional package: equatiomatic. Make sure to install it if you have not already done so.

1

Create a new R Markdown file in the src folder in your project directory. This time, the output type should be PDF. As before, specify a meaningful title and a subtitle, add an author name, and a date in the YAML header.

2

For this document, we want to make sure that figures and tables are not displayed before the relevant text. By default, figure and table environments in LaTeX are floats, so LaTeX decides where to place them. To force forward floating, we can use the flafter LaTeX package. We can specify that in the YAML header.

In addition, we also want to keep the .tex output file.
You can specify extra (package) dependencies as an additional option within the pdf_document output type specification in the YAML header. There, you can also specify another option to keep the .tex file.
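As a rough sketch, a YAML header along these lines would cover tasks 1 and 2. The title, subtitle, and author values are placeholders, and inserting the date via inline R is just one common choice, not a requirement.

```yaml
---
title: "Trust in institutions and use of news sources"   # placeholder title
subtitle: "A short report on the synthetic survey data"  # placeholder subtitle
author: "Jane Doe"                                        # placeholder author
date: "`r Sys.Date()`"          # current date, evaluated when knitting
output:
  pdf_document:
    extra_dependencies: ["flafter"]  # loads the flafter LaTeX package
    keep_tex: true                   # keep the intermediate .tex file
---
```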

3

For our PDF report, we also want to properly cite the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany data set as well as R and the packages we use, as it is important to cite data and free and open-source software (FOSS). In the solutions folder in the workshop materials, you can find two .bib files with BibTeX references: one for the data set and R (refs.bib) and one for the packages used in the R Markdown file (packages.bib). Copy the files to the src folder in your project directory and add them as bibliographies to the YAML header.
The YAML key you need here is simply bibliography. If you want to, you can also create the .bib files yourself or edit them (which is possible with a simple text editor, such as Notepad++ for Windows).

Optional

If you want to, you can also specify a citation style of your choosing. In our solution, we use APA (7th).
You can specify a citation style via the csl key in the YAML header. You can find an extensive list of .csl files in the CSL repository on GitHub. Note that you do not necessarily need to have a local copy of the .csl file. You can also provide a URL for a .csl file hosted online as the value for the csl key in the YAML header of your .Rmd file.
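The bibliography and csl keys described above might look like this in the YAML header. This is a sketch that assumes both .bib files sit next to the .Rmd file and that a local copy of the style file has been saved under the name apa.csl; as noted, a URL to a hosted .csl file can be used instead.

```yaml
bibliography:
  - refs.bib        # data set and R
  - packages.bib    # R packages
csl: apa.csl        # assumed local file name for the APA style
```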

4

The structure of the document should be the same as for the HTML report we generated before. You should also use the same options in the setup chunk and load the required packages (including the additional equatiomatic package) and run (or source) the wrangling code at the beginning of the .Rmd document.

The content of this report should be similar to the previous one, except that we now use different variables and the whole data set. The predictor variables we want to use in this report are some of the variables that measure trust in specific groups and institutions, namely trust_government, trust_who and trust_scientists. This time, we have two target/outcome variables: info_nat_pub_br and info_fb.

We also want to do two other extra things in this report: 1) add page breaks between the Methods and Results sections and between the correlation and regression results, 2) cite the version of R you use at the beginning of the Methods section, the data set in the Sample (sub)section, and the R packages you use whenever you first use them.
There is a command in R for printing your R version as a string.
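One way to get the version string is sketched below. The object name r_version is only illustrative. In the running text, the same value can be inserted with inline R code, and in PDF output a page break can be produced with the LaTeX command \newpage on its own line.

```r
# Version string of the R installation that knits the document
r_version <- R.version.string
print(r_version)
```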

5

In the results section, you should first present the descriptive/summary statistics for the trust variables.
As in the previous report, you can use the stargazer package for this. Make sure to specify the correct type of output (and don’t forget to set the chunk option to results = 'asis'). In order to avoid warning messages due to missing (or multiply-used) labels, you should specify a label in the stargazer function. Also, unlike for the HTML output, you do not need to add Table 1 etc. to the captions. The numbering and labeling of tables is done automatically by Pandoc in the case of LaTeX/PDF documents.
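A minimal sketch of such a stargazer call follows. The data frame toy_data and its values are simulated purely for illustration (they are not the survey data), and the title and label strings are placeholders. In the report, the chunk would need results = 'asis'.

```r
library(stargazer)

# Simulated stand-in for the trust variables (illustration only)
set.seed(42)
n <- 200
toy_data <- data.frame(
  trust_government = sample(1:5, n, replace = TRUE),
  trust_who        = sample(1:5, n, replace = TRUE),
  trust_scientists = sample(1:5, n, replace = TRUE)
)

# stargazer expects a plain data.frame for summary statistics;
# a tibble should be converted with as.data.frame() first
stargazer(toy_data,
          type   = "latex",                 # LaTeX output for the PDF report
          header = FALSE,                   # suppress the package comment header
          title  = "Descriptive statistics for the trust variables",
          label  = "tab:desc-trust")        # placeholder label
```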

6

Next, calculate the correlations between the predictor variables and present them in a table using the corrr package and knitr::kable().
This time you only need to change the selection (and optionally also the naming) of the variables to be used in the code you used in the previous report.
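For orientation, a correlation table for the three trust variables could be built roughly as follows. This is a sketch on simulated stand-in data (toy_data is not the survey data), and it may differ from the code used in the previous report; the caption is a placeholder.

```r
library(corrr)
library(knitr)

# Simulated stand-in for the trust variables (illustration only)
set.seed(42)
n <- 200
toy_data <- data.frame(
  trust_government = sample(1:5, n, replace = TRUE),
  trust_who        = sample(1:5, n, replace = TRUE),
  trust_scientists = sample(1:5, n, replace = TRUE)
)

# Pairwise correlations, then blank out the redundant upper triangle
trust_cors <- correlate(toy_data, quiet = TRUE)
trust_cors <- shave(trust_cors)

# Print missing cells as empty strings instead of NA
options(knitr.kable.NA = "")

cor_table <- kable(trust_cors,
                   format  = "latex",
                   digits  = 2,
                   caption = "Correlations between the trust variables")
cor_table
```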

7

For this report, we want to compute and present two logistic regression models (with the same predictors but different outcome variables). You can name them model_pubbr and model_fb after their outcome variables.
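The two models could be fitted with glm() along these lines. The sketch assumes that both outcome variables are coded 0/1 and uses a simulated data frame (toy_data) in place of the wrangled survey data, so the estimates themselves are meaningless.

```r
# Simulated stand-in data (illustration only, not the survey data)
set.seed(42)
n <- 200
toy_data <- data.frame(
  trust_government = sample(1:5, n, replace = TRUE),
  trust_who        = sample(1:5, n, replace = TRUE),
  trust_scientists = sample(1:5, n, replace = TRUE),
  info_nat_pub_br  = rbinom(n, size = 1, prob = 0.6),
  info_fb          = rbinom(n, size = 1, prob = 0.3)
)

# Logistic regression: same predictors, different binary outcomes
model_pubbr <- glm(info_nat_pub_br ~ trust_government + trust_who + trust_scientists,
                   family = binomial(link = "logit"),
                   data   = toy_data)

model_fb <- glm(info_fb ~ trust_government + trust_who + trust_scientists,
                family = binomial(link = "logit"),
                data   = toy_data)

summary(model_fb)
```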

8

Before you present the regression results, in this report we also want to provide the formulas for the logistic regression models: Once in general form for a logistic regression model with three predictors, and then specifically for the two models we compute (with the correct variable names). For the former, we can use LaTeX code (in the text) and for the latter, we can use the equatiomatic package (within a code chunk).

You can find template LaTeX code for the formula for a logistic regression model with 5 predictors here. You can display this formula in inline math mode.
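For the three-predictor case, the general formula might be written as below, where p stands for P(Y = 1) and x_1 to x_3 are the predictors. The notation is one common choice and not necessarily the one used in the template. Single dollar signs give inline math; doubling them would set the formula on its own line.

```latex
$\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
```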

The main function of the equatiomatic package is extract_eq() into which you can feed (or pipe) a model object. If you want to make the output a bit more similar to the general LaTeX formula, you can specify the intercept argument of this function accordingly.
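A sketch of how extract_eq() could be applied to one of the models is given below. The data are simulated stand-ins (toy_data is not the survey data), and depending on the installed equatiomatic version the chunk may need results = 'asis' for the equation to render in the PDF.

```r
library(equatiomatic)

# Simulated stand-in data (illustration only, not the survey data)
set.seed(42)
n <- 200
toy_data <- data.frame(
  trust_government = sample(1:5, n, replace = TRUE),
  trust_who        = sample(1:5, n, replace = TRUE),
  trust_scientists = sample(1:5, n, replace = TRUE),
  info_fb          = rbinom(n, size = 1, prob = 0.3)
)

model_fb <- glm(info_fb ~ trust_government + trust_who + trust_scientists,
                family = binomial(link = "logit"),
                data   = toy_data)

# intercept = "beta" writes the intercept as a beta term rather than alpha,
# which matches the notation of the general formula more closely
eq_fb <- extract_eq(model_fb, intercept = "beta")
eq_fb
```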

9

As for the previous R Markdown report, also write some short prose for the Discussion section and include the reproducibility information about the R version, packages, and OS you used.
For nicer printing of the session information, you can convert it to LaTeX with the aptly named function toLatex().
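A sketch of the conversion is shown below. The object name session_tex is illustrative, and dropping the locale information is optional. In the report, the chunk would need results = 'asis' so that the LaTeX code is passed through to the PDF.

```r
# Convert the session information to LaTeX (an itemize environment)
session_tex <- toLatex(sessionInfo(), locale = FALSE)
print(session_tex)
```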

Bonus 1

See if you can move the References section before the Reproducibility information.
There are two options for this: a Pandoc solution and an R Markdown solution.

Bonus 2

Try to automatically create the packages.bib file for the packages you use in this R Markdown report.
The write_bib() function from knitr allows you to create .bib files in R. You can get information about the loaded packages with the .packages() function. Note that 7 of the packages in that list (base, datasets, graphics, grDevices, methods, stats, and utils) are included in base R, so you do not need to cite them separately if you properly cite R itself.
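The hint above could be turned into code roughly as follows. This is a sketch: it writes to a temporary directory so that it can be run anywhere, whereas in the exercise you would write packages.bib to the src folder. The vector base_pkgs is an illustrative helper name.

```r
library(knitr)

# Packages attached by default in a standard R session (covered by citing R itself)
base_pkgs <- c("base", "datasets", "graphics", "grDevices",
               "methods", "stats", "utils")

# Currently attached packages minus the default ones
pkgs <- setdiff(.packages(), base_pkgs)

# In the exercise, replace this path with "packages.bib" in the src folder
bib_file <- file.path(tempdir(), "packages.bib")

# Entries get citation keys with the default prefix "R-", e.g. R-knitr
write_bib(pkgs, file = bib_file)
```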

Finally, as before, knit the document and store the resulting PDF document in the output folder. Then add, commit, and push the files/changes.